2_Sequence_bioinformatics

Master Medical Biometry/Biostatistics, Introduction to Bioinformatics, Medizinische Fakultät Heidelberg


The purpose of these exercises is to introduce you to common procedures in bioinformatics using web tools, with HIV as a case study.

Provide answers to the questions marked with ‘Q’.


Biological background

This animation describes, in simple terms, the life cycle of HIV and explains how the virus can be fought by inhibiting critical mechanisms.

Tools

For many of the following exercises we will make use of programs that are part of the EMBOSS package (The European Molecular Biology Open Source Software Suite).

Translation of nucleotide sequences

  • Program: sixpack - It translates a nucleotide sequence into its six possible reading frames
  • Data: gag_mrna.fa - mRNA of the GAG gene

Here we will identify possible peptides (or proteins) from the mRNA of the GAG gene.

  • Locate the program and run using default parameters
    • The program will create two output files on the same page: outfile and outseq
    • Look at the amino acid sequences for the first 120 nucleotides in the sixpack outfile


Q1. Which of the reading frames is likely to encode a protein? How could you tell?

Answer: F1 is the most likely to be translated, since no stop codons interrupt the sequence.
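
The reasoning behind this answer can be sketched in a few lines of Python. This is only an illustration of six-frame translation, not a reimplementation of sixpack: the frame labels F1-F3 and R1-R3 are our own naming (sixpack may label and order the reverse frames differently), and the toy sequence is a short hypothetical fragment, not the contents of gag_mrna.fa.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; '*' marks a stop, 'X' an unknown codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Translate a sequence in three forward and three reverse frames."""
    dna = seq.upper().replace("U", "T")
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"F{offset + 1}"] = translate(dna[offset:])
        frames[f"R{offset + 1}"] = translate(rc[offset:])
    return frames


def frames_without_stops(frames):
    """Names of the frames whose translation contains no stop codon."""
    return [name for name, protein in frames.items() if "*" not in protein]


# Hypothetical toy fragment, for illustration only.
toy = "ATGGGTGCGAGAGCGTCAGTA"
frames = six_frames(toy)
for name in sorted(frames):
    print(name, frames[name])
print("No stop codons:", frames_without_stops(frames))
```

On a real mRNA, a long frame that starts with M and runs without a '*' is the natural candidate for the encoded protein.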


This is the GAG protein:

>Gag_protein gi|2801504|gb|AAC82593.1| Gag [Human immunodeficiency virus 1]
MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQT
GSEELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEEQNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQG
QMVHQAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAA
EWDRVHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTNNPPIPVGEIYKRWIILGLNKIVRMYSPT
SILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPAATLEEMMTAC
QGVGGPGHKARVLAEAMSQVTNSATIMMQRGNFRNQRKIVKCFNCGKEGHTARNCRAPRKKGCWKCGKEG
HQMKDCTERQANFLGKIWPSYKGRPGNFLQSRPEPTAPPEESFRSGVETTTPPQKQEPIDKELYPLTSLR
SLFGNDPSSQ
  • Locate it in the sixpack outseq


Q2. To which ORF does it match? How many other ORFs were predicted from the GAG mRNA?

Answer: It matches ORF1, and there are 142 predicted ORFs.
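
To see where such a count can come from, here is a minimal sketch that treats every stretch between two stop codons as a candidate ORF and then looks for a known protein among them. It assumes the frames have already been translated into strings with '*' for stops; the frame strings below are hypothetical, and sixpack's own ORF definition and options (for example its minimum ORF size) may differ from this simplification.

```python
def split_orfs(frame_name, protein, min_len=1):
    """Split one translated frame at stop codons ('*') into candidate ORFs."""
    orfs = []
    for segment in protein.split("*"):
        if len(segment) >= min_len:
            orfs.append((frame_name, segment))
    return orfs


def find_matching_orfs(orfs, known_protein):
    """Return the 1-based indices of ORFs that contain the known protein."""
    return [index for index, (_, segment) in enumerate(orfs, start=1)
            if known_protein in segment]


# Hypothetical frame translations, for illustration only.
frames = {"F1": "MGARASVLSG", "F2": "WVRE*RQ*Y", "F3": "GCES*VS"}

all_orfs = []
for name, protein in frames.items():
    all_orfs.extend(split_orfs(name, protein))

print(len(all_orfs), "candidate ORFs")
print("ORFs containing the query:", find_matching_orfs(all_orfs, "MGARASV"))
```

Raising min_len removes the many very short stop-to-stop fragments, which is why the number of predicted ORFs depends strongly on that setting.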


BLAST and homologues to HIV proteins

Acquired Immune Deficiency Syndrome (AIDS) is caused by two closely related viruses, Human Immunodeficiency Virus type 1 (HIV-1) and Human Immunodeficiency Virus type 2 (HIV-2). HIV-1 is responsible for the global pandemic, while HIV-2 has, until recently, been restricted to West Africa and appears to be less virulent. Viruses related to HIV have been found in many species of non-human primates (monkeys, apes, …) and have been named Simian Immunodeficiency Virus (SIV).

  • Program: BLAST - Identifies sequences (nucleotide or amino acid) that are homologous to a query sequence (nucleotide or amino acid)
  • Data: rev_prot.fa - this is the reverse transcriptase protein in HIV

Here we will identify proteins in other organisms that are similar to the HIV reverse transcriptase.

  • Go to the BLAST webpage
    • Select the correct “flavor” of BLAST
    • Use the reverse transcriptase protein as query
    • Select UniProtKB/Swiss-Prot as database
    • Click BLAST
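
Choosing the “flavor” comes down to the type of the query and the type of the database. The sketch below encodes that decision for the four most common programs; the helper names and the 0.9 threshold used to guess the sequence type are our own assumptions, and tblastx (translated query against translated database) is left out for brevity.

```python
def guess_sequence_type(seq):
    """Guess 'nucleotide' or 'protein' from the letters in a sequence."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    # Assumed heuristic: mostly A/C/G/T/U/N means nucleotide.
    return "nucleotide" if nucleotide_like / len(letters) > 0.9 else "protein"


# (query type, database type) -> BLAST program
BLAST_FLAVORS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def choose_blast(query_seq, database_type):
    """Pick a BLAST program for a query sequence and a database type."""
    return BLAST_FLAVORS[(guess_sequence_type(query_seq), database_type)]


# Start of the Gag protein shown earlier, used here as an example query.
query = "MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERF"
print(choose_blast(query, "protein"))
```

UniProtKB/Swiss-Prot is a protein database, so a protein query against it calls for the protein-protein program.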


Once the analysis finishes:

  • Go to the Graphic summary tab
  • There you can see the different hits found and their corresponding alignment scores


Q3. What is the lowest alignment score? What is the description of the protein?

Answer: Score = 44, with an E-value of 5.2e-06, which corresponds to accession P27971.1:

RecName: Full=Protein Rev; AltName: Full=Regulator of expression of viral proteins [Simian immunodeficiency virus (AGM155 ISOLATE)]

This is not a reverse transcriptase, but a protein that has the motif


  • Go to the Taxonomy tab


Q4. Are you able to find REV proteins from human (HIV-2) and monkey (SIV)?

Answer: 5 SIV sequences: SIVcpz MB66, EK505, GAB1, TAN1 and Simian immunodeficiency virus (AGM155 ISOLATE).

No HIV-2 sequences appear; however, if we allow for more hits we will find them, as they are evolutionarily related to HIV-1, though quite distantly.
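
What the Taxonomy tab does can be approximated by reading the organism name in square brackets at the end of each hit description and counting hits per virus group. The sketch below assumes that description format; the first description is the hit quoted in the answer to Q3, while the other two are hypothetical and only there to exercise the code.

```python
import re
from collections import Counter

# Organism name in square brackets at the end of a hit description.
ORGANISM = re.compile(r"\[([^\[\]]+)\]\s*$")
HIV_TYPE = re.compile(r"human immunodeficiency virus (?:type )?([12])",
                      re.IGNORECASE)


def organism_of(description):
    """Extract the bracketed organism name, or None if there is none."""
    match = ORGANISM.search(description)
    return match.group(1) if match else None


def virus_group(organism):
    """Map an organism name to 'SIV', 'HIV-1', 'HIV-2', 'other' or 'unknown'."""
    if organism is None:
        return "unknown"
    if "simian immunodeficiency virus" in organism.lower():
        return "SIV"
    hiv = HIV_TYPE.search(organism)
    if hiv:
        return f"HIV-{hiv.group(1)}"
    return "other"


descriptions = [
    # Hit quoted in the answer to Q3.
    "RecName: Full=Protein Rev; AltName: Full=Regulator of expression of "
    "viral proteins [Simian immunodeficiency virus (AGM155 ISOLATE)]",
    # Hypothetical descriptions, for illustration only.
    "RecName: Full=Protein Rev [Human immunodeficiency virus type 1 BH10]",
    "RecName: Full=Protein Rev [Human immunodeficiency virus 2]",
]

counts = Counter(virus_group(organism_of(d)) for d in descriptions)
for group, n in sorted(counts.items()):
    print(group, n)
```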


Multiple alignments - Origin of HIV

  • Program: Clustal Omega, the web interface of a multiple sequence alignment program

  • Data: env_prot.fa - file with 13 different protein sequences from isolates of HIV-1, HIV-2, chimpanzee SIV (SIVCZ) and macaque monkey SIV (SIVM1 and SIVML)

  • Go to the Clustal Omega webpage

    • Upload the proteins file (or paste the sequences)
    • Select protein as format
    • Run with default parameters

Once you have the results:

  • Note the order of the sequences


  • Select Phylogenetic Tree
    • At the bottom of the page you will see the phylogenetic tree (evolutionary order) of your sequences
    • Toggle the Radial option on and off to compare the two tree layouts


Q5. What does this tree tell us about the phylogenetic relationship of HIV-1, HIV-2 and SIV?

Answer: HIV-2 (HV2) and the macaque SIV sequences (SIVM) cluster together, while HIV-1 (HV1) and the chimpanzee SIV (SIVCZ) form a separate cluster. This hints at how HIV was transmitted from non-human primates to humans.
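
The clustering in the tree reflects how similar the aligned sequences are to each other. As a rough illustration, the sketch below computes pairwise percent identity from an alignment and reports each sequence's closest partner. The four aligned sequences are short, made-up toys whose names merely echo the labels in env_prot.fa, and Clustal Omega builds its tree with its own distance measure and clustering method, so this is not a reconstruction of the tree shown on the results page.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def closest_partner(alignment):
    """For each sequence, name the other sequence with the highest identity."""
    result = {}
    for name, seq in alignment.items():
        scored = [(percent_identity(seq, other_seq), other)
                  for other, other_seq in alignment.items() if other != name]
        result[name] = max(scored)[1]
    return result


# Made-up aligned toy sequences, for illustration only.
alignment = {
    "HV1_toy":   "MRVKGIQRNC",
    "SIVCZ_toy": "MRVKGIQKNW",
    "HV2_toy":   "MCGKNQLFVA",
    "SIVM_toy":  "MCGRNQLLVA",
}

for name, partner in closest_partner(alignment).items():
    print(f"{name} is closest to {partner}")
```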


Multiple alignments - HIV drug resistance

A number of drugs against HIV have been developed. One example is AZT, which inhibits the reverse transcriptase (RT) encoded by the HIV genome. AZT binds to the active site of the RT and thereby blocks its polymerase activity. However, the mutation frequency of the HIV genome is very high, and resistance to AZT develops easily. This typically occurs through amino acid changes close to the active site that reduce the affinity for AZT.

  • Program: Clustal Omega, the web interface of a multiple sequence alignment program

  • Data: rt_isolates.fa - file containing RT amino acid sequences from AZT-resistant as well as AZT-sensitive strains

  • Make a multiple alignment of the RT isolates

    • Go to the Tool output tab
    • There are 2 mutations that are responsible for the resistance to the treatment


Q6. What are these positions and what are the amino acid changes? Hint: Find two positions that have been mutated in all the AZT resistant strains but not in the sensitive strain.

Answer (sensitive -> resistant):
67 D -> N (D67N)
70 K -> R (K70R)
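
The hint in Q6 describes a simple scan over the alignment columns, which can be sketched as follows. The function and the short aligned sequences are hypothetical illustrations (they are not taken from rt_isolates.fa); positions are reported 1-based and counted along the ungapped sensitive sequence.

```python
def diagnostic_positions(sensitive, resistant):
    """Find columns where every resistant sequence differs from the sensitive one.

    Returns (position, sensitive residue, resistant residues) tuples, with
    positions counted along the ungapped sensitive sequence.
    """
    if any(len(seq) != len(sensitive) for seq in resistant):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    position = 0
    for column, ref in enumerate(sensitive):
        if ref == "-":
            continue  # gap in the sensitive strain: no residue number here
        position += 1
        residues = [seq[column] for seq in resistant]
        if all(r != ref and r != "-" for r in residues):
            hits.append((position, ref, "".join(sorted(set(residues)))))
    return hits


# Made-up aligned toy sequences, for illustration only.
sensitive = "MAT-GYLSE"
resistant = ["MVT-GYLTE", "MVTQGFLTE", "MVT-GYLTE"]

for position, ref, alt in diagnostic_positions(sensitive, resistant):
    print(f"{position} {ref} -> {alt}")
```

Note that the column where only one resistant toy sequence differs (Y versus F) is not reported, because the change must be present in all resistant sequences.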


HIV-1 RT structure

As a reminder, here are the main treatment strategies used against HIV:

  • Blocking entry into the host cell with fusion inhibitors
  • Inhibition of reverse transcriptase by nucleoside inhibitors
  • Inhibition of reverse transcriptase by non-nucleoside inhibitors
  • Blocking the integrase
  • Inhibition of the protease

Focusing on the reverse transcriptase (RT), let’s identify the key elements that are targeted by treatments against HIV infection and understand how HIV responds by developing resistance to these drugs.

  • Program: iCn3D - Web-based 3D Structure Viewer
  • Data: 1RTD - X-ray crystallography structure of HIV-1 RT in complex with DNA

If you have time:

  • Open the structure link in a separate window
    • This will take you to the Structure database at NCBI
    • Focus on the Molecular Graphic window
    • Click on full feature 3D viewer

Under Sequences and Annotations, you will see all the different molecules of 1RTD:

Two proteins:

  • chain A in light gray (1RTD_A)
  • chain B in yellow (1RTD_B)

Two nucleotide sequences:

  • the DNA-RNA complex, in blue and pink respectively (1RTD_E and 1RTD_F)

And some chemicals:

  • Four magnesium ions in green (1RTD_MG, 1RTD_MG2, 1RTD_MG3 and 1RTD_MG4) that are needed to stabilize the structural conformation
  • A thymidine (T) that will be incorporated into the DNA by this machinery

Look at the structure from different angles by dragging the mouse while holding down the left button.

Let’s clean up the structure. Select 1RTD_A in the right panel and click on:

Style -> Proteins -> Hide

Do the same with 1RTD_B. You can now easily see the DNA-RNA complex together with the Mg ions and the T molecule.

Let’s bring back chain A. Select 1RTD_A and then:

Style -> Proteins -> Ribbon

You can see how the DNA-RNA complex sits along the protein, guiding it to incorporate the thymidine (T). Let’s focus on the “hand”. Select 1RTD_A from the Sequences and Annotations window, then click on:

Style -> Proteins -> Hide

To highlight the “hand”:

Select -> Advanced

In the new window, fill in the following values:

Select: .A:1-324
Name:   polymerase 

Click on Save Selection to Defined Sets. To view our defined sets, click Select -> Defined Sets; a new window will appear listing all the different molecules. Scroll down and select polymerase, then:

Style -> Proteins -> Ribbon
Color -> Unicolor -> Red

To highlight the “thumb”, create a new selection under Select -> Advanced:

Select: .A:245-324
Name:   thumb

Click on Save Selection to Defined Sets. Under the Select sets window, select thumb and then:

Color -> Unicolor -> Yellow

Let’s add the catalytic aspartates, which are critical for the polymerase function. Under Select -> Advanced, create a new selection:

Select: .A:110,185,186
Name:   aspartates

Click on Save Selection to Defined Sets. Under the Select sets window, select aspartates and then:

Style -> Proteins -> Sphere
Color -> Unicolor -> Cyan

Finally, highlight a key tyrosine that stabilizes the template-primer through a hydrogen bond. Under Select -> Advanced, create a new selection:

Select: .A:183
Name:   tyrosine

Click on Save Selection to Defined Sets. Under the Select sets window, select tyrosine and then:

Style -> Proteins -> Sphere
Color -> Unicolor -> Gray
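
The four selections above all follow the same pattern: a dot, the chain letter, a colon, and then residue numbers or ranges. As a compact summary, the sketch below builds these strings and prints them next to the style and color used for each set. The helper function is hypothetical, and the selection syntax is inferred only from the examples in this exercise, not from the iCn3D documentation.

```python
def selection_spec(chain, residues):
    """Build a selection string such as '.A:1-324' or '.A:110,185,186'.

    `residues` is a list of residue numbers and/or (start, end) ranges.
    """
    parts = []
    for item in residues:
        if isinstance(item, tuple):
            start, end = item
            parts.append(f"{start}-{end}")
        else:
            parts.append(str(item))
    return f".{chain}:{','.join(parts)}"


# (set name, selection string, style, color) as used in the steps above.
DEFINED_SETS = [
    ("polymerase", selection_spec("A", [(1, 324)]), "Ribbon", "Red"),
    ("thumb", selection_spec("A", [(245, 324)]), None, "Yellow"),
    ("aspartates", selection_spec("A", [110, 185, 186]), "Sphere", "Cyan"),
    ("tyrosine", selection_spec("A", [183]), "Sphere", "Gray"),
]

for name, spec, style, color in DEFINED_SETS:
    print(f"{name:<11} {spec:<16} style={style or 'unchanged'} color={color}")
```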


If you are short on time:


  • Rotate the figure until you see the hand and the key elements (aspartates, tyrosine, magnesium ions) as in this figure

In a previous exercise, you aligned RT sequences from AZT-resistant strains using Clustal Omega. You then identified two residues that are mutated in all three AZT-resistant isolates.

  • Highlight these positions in the structure:

Select -> Advanced

  • In the new window, fill in the following values (X and Y are the positions of the mutations):

Select: .A:X,Y
Name:   mutations

  • Click on Save Selection to Defined Sets
  • Under the Select sets window, select mutations and then:

Style -> Proteins -> Sphere
Color -> Unicolor -> A color of your choosing

Figure descriptions:

  • Chain A + Chain B
  • Chain A + DNA/RNA duplex
  • DNA/RNA duplex + T + Mg ions
  • Hand in red
  • Thumb in yellow
  • Aspartates in cyan
  • Tyrosine in gray
  • Mutations in pink
  • Focusing on the hand and the DNA/RNA duplex
  • Rotating the structure
  • Focusing only on the hand


Q7. How could these mutations interfere in the treatment of HIV?

Answer: When these positions change, the drug no longer binds, so it stops blocking reverse transcription of the viral genome.